Self-tuning job scheduling strategies for the resource management of HPC systems and computational grids
Abstract
In this thesis we develop and study self-tuning job schedulers for resource management systems. Such schedulers search for the best solution among the available scheduling alternatives in order to improve on the performance of static schedulers. We implement this concept in two domains of real-world job scheduling. First, we study scheduling in resource management software for high performance computing (HPC) systems. Typically, a single scheduling policy such as first come, first served (FCFS) is used, although the characteristics of the submitted jobs change constantly. Using a single scheduling policy can cause a loss of performance, because other policies may be better suited to specific job characteristics. We develop a self-tuning scheduler that automatically evaluates all implemented policies and switches to the best one. This improves performance in terms of increased utilization and decreased waiting time. Second, we develop and study an adaptive scheduler for computational grid environments. In such grids, several geographically distributed HPC machines are joined in order to increase the available computational power. Grid jobs may be scheduled across multiple machines, so that communication among the job parts has to pass over slow wide area networks (WANs). This often induces an additional communication overhead, which the grid scheduler has to take into account. Our adaptive grid scheduler accounts for the slower communication over WANs by extending the execution time of such multi-site jobs. It then automatically checks which of two options is more beneficial: waiting until enough resources are available at a single site, or starting immediately on multiple sites over the slower WAN. In both cases we use discrete event simulations to evaluate the performance of the developed schedulers.
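As an illustration of the self-tuning idea, the following Python sketch evaluates each candidate policy on the current waiting queue and switches to the one with the lowest average waiting time. It is not the thesis's implementation. The policy set (FCFS plus shortest and longest job first, SJF and LJF), the average-waiting-time metric, the simplified planner (no backfilling, machine assumed empty at the decision time) and all names (`Job`, `plan`, `self_tune`) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Job:
    name: str       # unique job identifier
    submit: float   # submission time
    runtime: float  # (estimated) run time
    nodes: int      # number of requested nodes


def plan(queue: List[Job], total_nodes: int, now: float) -> Dict[str, float]:
    """Plan start times strictly in queue order (no overtaking, no backfilling).

    Simplification: the machine is assumed to be empty at time `now`.
    """
    allocations: List[Tuple[float, int]] = []  # (end time, nodes) of planned jobs
    starts: Dict[str, float] = {}
    t = now
    for job in queue:
        if job.nodes > total_nodes:
            raise ValueError(f"{job.name} requests more nodes than the machine has")
        t = max(t, job.submit)
        # Start times never decrease, so once enough nodes are free at t they
        # stay free; jump to the next planned job end until that holds.
        while total_nodes - sum(n for end, n in allocations if end > t) < job.nodes:
            t = min(end for end, _ in allocations if end > t)
        starts[job.name] = t
        allocations.append((t + job.runtime, job.nodes))
    return starts


# Hypothetical candidate policies: each maps a job to a sort key.
POLICIES: Dict[str, Callable[[Job], float]] = {
    "FCFS": lambda j: j.submit,    # first come, first served
    "SJF": lambda j: j.runtime,    # shortest job first
    "LJF": lambda j: -j.runtime,   # longest job first
}


def average_wait(queue: List[Job], total_nodes: int, now: float) -> float:
    starts = plan(queue, total_nodes, now)
    return sum(starts[j.name] - j.submit for j in queue) / len(queue)


def self_tune(waiting: List[Job], total_nodes: int, now: float) -> Tuple[str, Dict[str, float]]:
    """Evaluate every policy on the current queue and return the best one."""
    scores = {
        name: average_wait(sorted(waiting, key=key), total_nodes, now)
        for name, key in POLICIES.items()
    }
    best = min(scores, key=scores.get)  # ties go to the earlier policy (FCFS first)
    return best, scores


if __name__ == "__main__":
    jobs = [Job("A", 0.0, 10.0, 4), Job("B", 1.0, 1.0, 2), Job("C", 2.0, 1.0, 2)]
    best, scores = self_tune(jobs, total_nodes=4, now=2.0)
    print(best, scores)
```

On a 4-node machine, the wide job A blocks the two short jobs under FCFS (and under LJF, which yields the same order here), so the sketch switches to SJF for this queue. A real self-tuning scheduler would repeat this evaluation whenever the queue changes.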
The results for the self-tuning scheduler show that both an increased system utilization and decreased waiting times for the jobs are possible. We think that such self-tuning schedulers should be used in modern resource management systems for HPC machines. The evaluation of the grid scheduler shows that, in general, a combination of many small machines with multi-site scheduling cannot perform as well as a single large machine with the same amount of resources. However, the adaptive multi-site scheduler significantly reduces this performance difference. We think that participating in computational grid environments is beneficial, as larger problems requiring more computational power can be solved.
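The multi-site decision can be sketched in a similar way. The Python example below compares the completion time of waiting for a single site against starting immediately on several sites, where the run time of a multi-site job is stretched by a constant WAN slowdown factor. The factor of 1.25, the greedy split of nodes across sites, the completion-time criterion and the names `Site` and `choose_placement` are hypothetical choices for illustration; the thesis's actual model for extending the execution time may differ.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Site:
    name: str
    free_now: int                                 # nodes idle right now
    releases: Tuple[Tuple[float, int], ...] = ()  # (time, nodes freed) of running jobs

    def earliest_start(self, nodes: int, now: float) -> Optional[float]:
        """Earliest time this site alone has `nodes` idle nodes (None if never)."""
        free = self.free_now
        if free >= nodes:
            return now
        for time, freed in sorted(self.releases):
            free += freed
            if free >= nodes:
                return max(now, time)
        return None


Placement = Tuple[str, Dict[str, int], float, float]  # (mode, nodes per site, start, end)


def choose_placement(nodes: int, runtime: float, sites: Sequence[Site],
                     now: float, wan_slowdown: float = 1.25) -> Optional[Placement]:
    """Pick the option (single site vs. multi-site) that finishes the job first."""
    if wan_slowdown < 1.0:
        raise ValueError("WAN communication is assumed to never speed a job up")
    best: Optional[Placement] = None
    # Option 1: wait until a single site can run the whole job.
    for site in sites:
        start = site.earliest_start(nodes, now)
        if start is not None and (best is None or start + runtime < best[3]):
            best = ("single", {site.name: nodes}, start, start + runtime)
    # Option 2: start now across several sites; WAN traffic stretches the run time.
    if sum(s.free_now for s in sites) >= nodes:
        alloc: Dict[str, int] = {}
        remaining = nodes
        for site in sorted(sites, key=lambda s: s.free_now, reverse=True):
            take = min(site.free_now, remaining)
            if take > 0:
                alloc[site.name] = take
                remaining -= take
            if remaining == 0:
                break
        end = now + runtime * wan_slowdown
        if best is None or end < best[3]:  # on a tie, prefer the single site
            best = ("multi", alloc, now, end)
    return best
```

For a 64-node job with a run time of 100, where site A has 32 idle nodes (and 64 more freed at time 50) and site B has 40 idle nodes, the sketch starts the job immediately across B and A, finishing at 125. With a slowdown factor of 1.6 the multi-site run would end at 160, so it waits for site A alone and finishes at 150.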
Similar resources
A New Job Scheduling in Data Grid Environment Based on Data and Computational Resource Availability
A Data Grid is an infrastructure that controls a huge number of data files and provides intensive computational resources across geographically distributed collaborations. The heterogeneity and geographic dispersion of grid resources and applications raise complex problems such as job scheduling. Most existing scheduling algorithms in Grids focus on only one kind of Grid job, which can be data...
A Comparison of Job Management Systems in Supporting HPC ClusterTools
This paper compares the three most common job management systems and how they work with Sun HPC ClusterTools 3.1. Various aspects such as installation, customization, scheduling and resource control issues are discussed. The three chosen systems are: Load Sharing Facility (LSF), Portable Batch System (PBS) and COmputing in DIstributed Networked Environment (CODINE)/ Global Resource Director (GRD)....
A note on new trends in data-aware scheduling and resource provisioning in modern HPC systems
The Big Data era [1,2] poses new challenges as well as significant opportunities for High-Performance Computing (HPC) systems, such as how to efficiently turn massively large data into valuable information and meaningful knowledge. Clearly, new computationally optimized, data-driven HPC techniques are required for processing Big Data in a rapidly increasing number of applications, such as L...
The Self-Tuning dynP Job-Scheduler
In modern resource management systems for supercomputers and HPC clusters, the job scheduler plays a major role in improving the performance and usability of the system. The performance of the scheduling policies in use (e.g. FCFS, SJF, LJF) depends on the characteristics of the queued jobs. Hence we developed the dynP scheduler family. The basic idea was to change between different scheduling pol...
Job Scheduling for Computational Grids
Grid computing is a method to execute computational jobs requiring a significant amount of computing resources and/or large sets of data. Contrary to large heterogeneous distributed systems, a Computational Grid has many independent resource providers with different access policies. In addition to the size of such a Grid, the diversity of those policies leads to a very complex allocation task t...